ABSTRACT


Conventionally programmed digital computers can process numbers with great speed
and precision, but do not easily recognize patterns or handle imprecise or contradictory
data. Instead of being programmed in the conventional sense, artificial neural networks
(ANNs) are capable of self-learning through exposure to repeated examples. However,
the training of an ANN can be a time-consuming and unpredictable process.

A general method is being developed by the author to mate the adaptability of the ANN 
with the speed and precision of the digital computer. This method has been successful in 
building feedforward networks that can approximate functions and their partial derivatives 
from examples in a single iteration. The general method also allows the formation of 
feedforward networks that can approximate the solution to nonlinear ordinary and partial 
differential equations to desired accuracy without the need of examples. It is believed that 
continued research will produce artificial neural networks that can be used with confidence 
in practical scientific computing and engineering applications. 
INTRODUCTION


Neural networks have proven to be versatile tools for accomplishing what could be termed 
higher order tasks such as pattern recognition, classification, and visual processing.
However, conventional wisdom has held that networks are unsuited for use in more purely
computational tasks, such as mathematical modelling and physical analysis of engineering 
systems. Certainly the biological underpinnings of the neural network concept suggest that 
networks would perform best at tasks at which biological systems excel, and worse or not 
at all at other tasks. 

Contrary to popular opinion, the author believes that continued research into the
approximation capabilities of networks will enable the neural network paradigm, with all of
its advantages in behavior and adaptability, to be mated to the more purely computational
paradigms of mathematically oriented scientific programming and analysis. Additionally,
a thorough investigation of network approximation capabilities should benefit
the network field and connectionism in general.

In a field as conceptually difficult as the study of artificial neural networks, it is best to 
start investigation with supervised learning, test the established premises, and alter them 
to circumvent pitfalls in implementation.

FUNCTION APPROXIMATION 
Learning as Function Approximation 

Central to the author’s research approach is the view that supervised learning in artificial 
neural networks is equivalent to the problem of approximating a multivariate function and 
that learning should be explainable by approximation theory. Approximation theory
deals with the problem of approximating or interpolating a multivariate function. This
approach has been considered by other researchers in the field of ANNs [1]-[4]. However,
the author extends this assumption of function approximation by assuming that ANNs 
can model discontinuous multivariate functions and should be at least as accurate and 
numerically efficient as existing computational techniques used in science and engineering. 
Also, ANN behavior and adaptation difficulties, from supervised learning to machine vision, 
should be amenable to the standard error analysis techniques used in numerical analysis 
[5]. 

Function Approximation In Engineering 

There are three classes of tools used in science and engineering for the analysis of systems: 

1. analytical methods, which include the formation of equations that model the
behavior of systems and the analytic solution of those equations.

2. computational methods, which involve the simulation of system behavior by the
numerical solution of the governing equations.

3. experiments, which involve the investigation of physical phenomena and the
gathering of data to validate analytical models and numerical simulations.
In a general sense, analytical methods, computational methods, and experiments can all be
considered forms of function approximation. The governing equations derived from
analytical methods are a compact representation of the functions that model some particular
phenomena observed in experiments. Computational techniques are used to approximate
the function or functions that satisfy the governing equations. The graphs and tables made
from experiments are representations of the functions that underlie observed physical
phenomena.

Computational Methods 

In the wake of the computer revolution in scientific applications, a large number of
computational techniques have emerged. Also, particular methods have assumed prominent
positions in certain areas of application. For example, finite element methods are used
almost exclusively for solving structural problems; spectral methods are becoming the
preferred approach to global atmospheric modelling; and the use of finite difference methods
is nearly universal in simulating fluid and thermal systems.

Each computational method has its own set of advantages and disadvantages depending
on the characteristics of the application. These popular and apparently unrelated techniques
are firmly entrenched in computer codes used every day by practicing scientists and
engineers. Often the formal numerical training provided to the scientist and engineer reinforces
the divisions between the various computational methods available. However, Fletcher [6]
has demonstrated that each of these numerical methods is in fact a particular aspect of a
more general approach known as the method of weighted residuals [7].

PROGRAMMABLE ARTIFICIAL NEURAL NETWORKS 

It is the objective of the author’s research program to demonstrate that artificial neural
network behavior from supervised learning to machine vision can be derived from the method
of weighted residuals. This would link ANNs with the relatively mature and established 
field of computational mechanics, extend ANN capabilities, and help in transforming ANN 
applications from an art to a science. This may also advance research in our understanding 
of biological neural systems. 

If we are to assume that ANNs are as valid as established computational techniques, 
then ANNs should be evaluated in the same manner as are computational techniques. 

The first step in evaluating the capabilities of a new numerical method is to apply it to
the solution of algebraic and ordinary and partial differential equations of known behavior.
This same approach can be used for ANNs, since the solution of algebraic and differential
equations can be viewed as the approximation of a function that must satisfy the equation
in question subject to boundary and/or initial conditions.

Applying an ANN to the solution of an algebraic or differential equation effectively 
uncouples the influences of the quality of data samples, network architecture, and transfer 
functions from the network approximation performance. The solution of equations also 
allows us to study the influence of constraining the connection weights. The most immediate 
benefit in this approach would be the construction of networks that can approximate the 
solution to desired equations without the need for examples. This would be of value in 
engineering applications since considerable effort may be saved if the equations governing a
physical process can be directly incorporated into the neural network architecture without 
the need of examples, thereby shortening or even eliminating the learning phase. 

This approach may also lead to the construction of network training routines that are 
faster and more accurate than those presently in use [8], [9]. In addition, progress made in 
this network programming approach should provide the research community insight into the 
working of networks for associative memory, classification, and machine vision applications. 

Approach 

The method of weighted residuals (MWR) approach has been taken by the author using the
hard limit [10] as the transfer function. Interesting results have been produced and are
presented to demonstrate the validity of the approach.

It can be argued that the supervised training of a feedforward network is a problem in
function approximation using unconstrained optimization. In this sense, the task of the
optimization scheme is to find the proper combination of connection weights between the
processing elements, operating with specific transfer functions, so that the network
minimizes the error between the network output and the desired output. Therefore, the training
of a network possesses all of the problems one associates with unconstrained optimization,
such as avoiding local minima in search of the global minimum.

The most obvious remedy to this problem is to constrain the optimization while
preserving the approximation capability of the network. Our task then is to form constraints
between the weights so that the values of the weights may be determined with
computational efficiency.

Constraint of Weights Between the Input and Hidden Layer 

A univariate function u(x) can be represented by a feedforward network (Fig. 1) with a 
single hidden layer, and a single input and output node using a linear transfer function, as 
follows: 

$$u(x) = \sum_{i=1}^{N} \left( \Phi_i(\xi_i) + s_i \right) w_i, \qquad \xi_i = \alpha_i x + \theta_i \quad \text{for } i = 1, \ldots, N \qquad (1)$$

Each hidden processing element is indexed by the subscript $i$, where $N$ is the total number
of hidden processing elements. The variable $x$ is the value of the input and $\Phi_i$ represents
the nonlinear transfer function for the $i$th hidden processing element. The coefficients $w_i$,
$\alpha_i$ and $\theta_i$ are the values of the connection weights between the hidden and output layers,
the input and hidden layers, and the bias node and hidden layer, respectively. The role
of the remaining set of coefficients, $s_i$, will be explained shortly. Equation (1) indicates
that we must determine the value of $4N$ coefficients to approximate a function with the
feedforward architecture. Our objective in constraining the weights is to decrease the
number of unknown coefficients.

The formulation of Eq. (1) is similar to that given by Cybenko [11]. However, $\xi_i$ will be
modified in a manner similar to the radial basis technique. Notice that to provide extended
dynamic range, this formulation assumes that the input and output processing elements
use linear transfer functions. However, this formulation also allows the use of nonlinear
transfer functions in the output node.

We will constrain the weights $\theta_i$ in the following manner. Discretize the domain ($\Omega$) of
the input variable into $N - 1$ intervals. Each interval is bracketed by the values $x_i$ and $x_{i+1}$,
where $x_i < x_{i+1}$ and $i = 1, \ldots, N$. The value of $N$ is equal to the total number of hidden
processing elements. We will use the following equation to constrain $\theta_i$:

$$\theta_i = -\alpha_i x_i \quad \text{for } i = 1, \ldots, N$$

so that

$$\xi_i = \alpha_i (x - x_i) \quad \text{for } i = 1, \ldots, N.$$

Notice that by constraining each bias weight ($\theta_i$) in this manner, $\xi_i = 0$ for the $i$th processing
element when $x = x_i$. The variables $x_i$ are similar to the “centers” used in the radial basis
function literature [12], [13].

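For our analysis we will use a piecewise continuous polynomial approximation to the
hyperbolic tangent (Eq. (2)) known as the hard limit, which is illustrated in Fig. 2.

$$\Phi_i(\xi_i) = \xi_i - 1 \quad \text{where } \xi_i = \frac{2 (x - x_i)}{x_{i+1} - x_i} \quad \text{for } x_i \le x \le x_{i+1},$$

$$\Phi_i(\xi_i) = -1 \ \text{for } x < x_i \quad \text{and} \quad \Phi_i(\xi_i) = +1 \ \text{for } x_{i+1} < x. \qquad (2)$$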

Therefore, for the piecewise polynomial transfer function we can constrain the input
weights $\alpha_i$ using Eq. (2).

$$\alpha_i = \frac{2}{x_{i+1} - x_i} \quad \text{for } x_i \le x \le x_{i+1}$$

Notice then, that transfer functions from each processing element are distributed along the
$x$ axis in the domain of interest $\Omega$. Each function is “centered” at the respective value of
$x_i$ (Fig. 3).
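The constrained hidden layer can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `hidden_layer_weights`, `hard_limit`, and `network_output` are hypothetical, NumPy is assumed, and the sketch assumes that the $N$ hidden elements are defined by $N + 1$ interval end points so that every element has a right-hand end point $x_{i+1}$.

```python
import numpy as np

def hidden_layer_weights(knots):
    """Constrained input and bias weights from interval end points.

    knots: increasing values x_1 < ... < x_{N+1} (assumption: N + 1 end
    points for N hidden elements, so each element has a right end point).
    """
    knots = np.asarray(knots, dtype=float)
    alpha = 2.0 / np.diff(knots)      # alpha_i = 2 / (x_{i+1} - x_i)
    theta = -alpha * knots[:-1]       # theta_i = -alpha_i * x_i
    return alpha, theta

def hard_limit(x, alpha, theta):
    """Hard-limit output of every hidden element; shape (len(x), N)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xi = alpha * x[:, None] + theta   # xi_i = alpha_i * x + theta_i
    # xi - 1 on the ramp, saturating at -1 on the left and +1 on the right
    return np.clip(xi - 1.0, -1.0, 1.0)

def network_output(x, alpha, theta, w, s):
    """Weighted sum of Eq. (1): sum_i (Phi_i(xi_i) + s_i) * w_i."""
    return (hard_limit(x, alpha, theta) + s) @ w

# Four hidden elements on [0, 1] with equal spacing.
knots = np.linspace(0.0, 1.0, 5)
alpha, theta = hidden_layer_weights(knots)
Phi = hard_limit([0.0, 0.125, 0.25, 1.0], alpha, theta)
print(Phi[:, 0])   # output of the first element at the four sample points
```

With equal spacing every $\alpha_i$ is the same, so only the end points $x_i$ distinguish one hidden element from the next.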

Equation (1) can now be seen as a weighted sum representation of the function $u(x)$.
One can think of the transfer functions as being interpolation functions distributed along
the transformed $x$ axis. Each interpolation function, together with its coefficient $s_i$, is multiplied by
its respective weight, $w_i$, and summed to approximate a desired curve.


Constraint of Weights Between the Hidden and Output Layer 

It can now be shown that the role of the coefficients $s_i$ in Eq. (1) is to add or subtract
constant values from the respective transfer function. This effectively moves the transfer
function above or below the $x$ axis (Fig. 4). Equation (1) may be rewritten as:

$$u(x) = \sum_{i=1}^{N} \Phi_i(\xi_i)\, w_i + \sigma \quad \text{where } \sigma = \sum_{i=1}^{N} s_i w_i \qquad (3)$$
The coefficient $\sigma$ acts as the connection weight between the second bias and the output
node (Fig. 1).

One final constraint we impose on the output weights is that $w_i = -w_{i+1}$. This requires
that we use an even number of processing elements. The final constraint acts to convert
the hard limits from global interpolation functions into local interpolation functions.
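The effect of pairing the output weights can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the hard limit is taken as a ramp from -1 at the left end point to +1 at the right end point, which is how Eq. (2) is read here. Two neighbouring hard limits weighted by $w$ and $-w$ cancel everywhere except over their two intervals, leaving a local, hat-shaped function.

```python
import numpy as np

def hard_limit(x, x_left, x_right):
    """-1 left of x_left, +1 right of x_right, linear in between."""
    xi = 2.0 * (np.asarray(x, dtype=float) - x_left) / (x_right - x_left)
    return np.clip(xi - 1.0, -1.0, 1.0)

def paired_output(x, x0, x1, x2, w):
    """Contribution of two neighbouring elements with weights w and -w."""
    return w * hard_limit(x, x0, x1) - w * hard_limit(x, x1, x2)

x = np.array([-1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0])
pair = paired_output(x, 0.0, 1.0, 2.0, w=1.0)
print(pair)   # zero outside [0, 2], peak of height 2*w at x = 1
```

Each single hard limit is nonzero over the whole axis, whereas the pair vanishes outside its two intervals, which is what makes the resulting interpolation functions local.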

We have now constrained all of the parameters for the single input and single output
feedforward network and have decreased the number of unknowns from $4N$ ($\alpha_i$, $\theta_i$, $s_i$, and
$w_i$) to $N/2 + 1$ ($w_i$ and $\sigma$). What remains now is to determine the output weights and the
second bias weight so as to approximate not only the desired functional relationship, but
also the derivatives of the function. This is done using the method of weighted residuals
(MWR), more specifically the Bubnov-Galerkin and Petrov-Galerkin methods.

ORDINARY DIFFERENTIAL EQUATIONS 
Determination of Output Weights: Method of Weighted Residuals 

To illustrate the method we will determine the output weights and bias ($w_i$ and $\sigma$) needed
to approximate the solution to a linear first order differential equation, using only the
equation and the initial condition [14].

A feedforward network of one input and output node and a single hidden layer is
constructed to approximate the solution to the following equation:


$$\frac{du}{dx} - u = 0 \qquad (4)$$

with the initial condition $u(0) = 1$. The exact solution is:

$$u_{exact} = e^{x}$$


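Substituting Eq. (1) into Eq. (4) we have:

$$\frac{du}{dx} - u = \sum_{i=1}^{N} \frac{d\Phi_i(\xi_i)}{dx}\, w_i - \left( \sum_{i=1}^{N} \Phi_i(\xi_i)\, w_i + \sigma \right) = 0. \qquad (5)$$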


In this example we will set $\sigma$ to zero. Notice that Eq. (5) is only satisfied if the
nontrivial set $w_i$ is exactly correct.

If we use random values for $w_i$, the right hand side would be nonzero and act as a
measure of the error. This error is also known as the equation residual ($\epsilon$). We may obtain
an acceptable approximation if we force the weighted residual to zero over the domain of
interest, $\Omega$.


$$\int_{\Omega} f_k(x)\, \epsilon \, dx = 0 = \int_{\Omega} f_k(x) \left( \frac{du}{dx} - u \right) dx \qquad (6)$$

where $k = 1, \ldots, N$ and $f_k(x)$ is referred to as the weighting function or test function. The
approach shown by Eq. (6) is known as the method of weighted residuals.
Since linearly independent relationships are needed to solve for the coefficients $w_i$, it is
clear that the $f_k$ must be a set of linearly independent functions. The choice of the weighting
functions corresponds to different solution techniques such as the subdomain, collocation,
least squares, and Galerkin methods [6]. For this investigation we will use a modification
of the Galerkin approach, or more specifically the Bubnov-Galerkin method [15]. This
approach requires that $f_k$ be chosen from the same family of functions as the transfer
functions, that is,

$$f_k(x) = \Phi_k(\xi_k) \quad \text{for } k = 1, \ldots, N. \qquad (7)$$

Except for the change in the index from $i$ to $k$, $\Phi_k(\xi_k)$ is described by the hat function.
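For reference, the weighting-function choices named in the preceding paragraph are commonly written as follows in the weighted-residual literature. The subdomains $\Omega_k$, the collocation points $x_k$, and the Dirac delta $\delta$ are standard textbook notation assumed here for illustration; they are not symbols defined in this paper.

```latex
\begin{align*}
\text{subdomain:}     \quad & f_k(x) = \begin{cases} 1, & x \in \Omega_k \\ 0, & \text{otherwise} \end{cases} \\
\text{collocation:}   \quad & f_k(x) = \delta(x - x_k) \\
\text{least squares:} \quad & f_k(x) = \frac{\partial \epsilon}{\partial w_k} \\
\text{Galerkin:}      \quad & f_k(x) = \Phi_k(\xi_k)
\end{align*}
```

Only the last choice reuses the transfer functions themselves as test functions, which is the property the Bubnov-Galerkin method relies on.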

So Eq. (6) may be rewritten as 

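$$\sum_{i=1}^{N} \int_{\Omega} \Phi_k(\xi_k) \frac{d\Phi_i(\xi_i)}{dx}\, dx \; w_i - \sum_{i=1}^{N} \int_{\Omega} \Phi_k(\xi_k)\, \Phi_i(\xi_i)\, dx \; w_i = 0 \quad \text{for } k = 1, \ldots, N. \qquad (8)$$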

This forms the linear algebraic system of equations 

$$\sum_{i=1}^{N} A_{ki} w_i = g_k \quad \text{for } k = 1, \ldots, N. \qquad (9)$$

In its present form $A_{ki}$ of Eq. (9) is singular and must be modified by the initial
condition. The initial condition may be written as

$$\sum_{i=1}^{N} \Phi_i(\xi_i)\, w_i = g_1 = 1. \qquad (10)$$

The weights $w_i$ can then be evaluated directly from the solution of Eq. (9). Notice that
the initial condition could have been partially satisfied by $w_i$ and the value of $\sigma$ would have
been determined to satisfy the remainder (i.e. $g_1 = 0.5$ and $\sigma = 0.5$).

Comparison of the network approximation with the exact solution of Eq. (4) is shown 
in Fig. 5 for forty two hidden processing elements (twenty one output weights). 
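The procedure can be illustrated with a small numerical sketch. It is not the author's implementation: it works directly with standard linear hat functions (the local functions produced by pairing hard limits) as both trial and test functions, uses exact element integrals on a uniform grid, and imposes the initial condition by replacing the first equation, which is an assumption about one reasonable way to modify the system. The function name `galerkin_first_order` and its parameters are hypothetical.

```python
import numpy as np

def galerkin_first_order(n_elem, x_end=1.0, u0=1.0):
    """Bubnov-Galerkin sketch for du/dx - u = 0 with u(0) = u0.

    Trial and test functions are linear hats on a uniform grid; the
    returned coefficients are the nodal values of the approximation.
    """
    n = n_elem + 1                        # number of nodes / hat functions
    x = np.linspace(0.0, x_end, n)
    h = x[1] - x[0]
    A = np.zeros((n, n))
    # Exact integrals over one interval of length h for the two local hats:
    conv = np.array([[-0.5, 0.5], [-0.5, 0.5]])             # int f_k dPhi_i/dx
    mass = (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])   # int f_k Phi_i
    for e in range(n_elem):
        A[e:e + 2, e:e + 2] += conv - mass   # weighted residual of du/dx - u
    g = np.zeros(n)
    # Assumption: the first equation is replaced by the initial condition.
    A[0, :] = 0.0
    A[0, 0] = 1.0
    g[0] = u0
    return x, np.linalg.solve(A, g)

x, u = galerkin_first_order(40)
max_error = np.max(np.abs(u - np.exp(x)))
print(max_error)   # largest nodal deviation from the exact solution e^x
```

The system is solved once, with no iterative training; refining the grid is the only way the approximation is improved in this sketch.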


Example: Third Order Nonlinear Ordinary Differential Equation 

As a further demonstration of the approximation capability of the network, a feedforward 
network of one input and output node, and a single hidden layer, was constructed to 
approximate the solution to the nonlinear third order ordinary differential equation known 
as the Blasius equation [17]. The Blasius equation is used to describe the steady and 
laminar two-dimensional flow of a viscous Newtonian fluid about a flat plate: 


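$$2 \frac{d^3 f}{d\eta^3} + f \frac{d^2 f}{d\eta^2} = 0 \qquad (11)$$

with boundary conditions

$$f(0) = \left. \frac{df}{d\eta} \right|_{\eta = 0} = 0, \qquad \frac{df}{d\eta}(\eta \to \infty) = 1.$$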

The network approximation (Fig. 6), using fifty one output weights, is compared against
a fifth order Runge-Kutta solver (finite difference) of variable step size that satisfied the
boundary conditions with an absolute error value of $10^{-6}$. Figure 7 illustrates the rate of
convergence of the network using the $L_2$ norm of the error and the interval spacing $h$.
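A reference solution of this kind can be sketched with a shooting method. The code below is an illustration under stated assumptions, not the solver used in the paper: it assumes the Blasius equation in the form $2 f''' + f f'' = 0$, replaces the fifth order variable-step scheme with a classical fixed-step fourth order Runge-Kutta integrator, truncates the far-field condition at a finite $\eta_{max}$, and finds the unknown wall curvature $f''(0)$ by bisection. All function names are hypothetical.

```python
import numpy as np

def blasius_rhs(state):
    """First-order form of 2 f''' + f f'' = 0, state = (f, f', f'')."""
    f, fp, fpp = state
    return np.array([fp, fpp, -0.5 * f * fpp])

def integrate(fpp0, eta_max=10.0, steps=500):
    """Fixed-step RK4 from eta = 0 with f(0) = f'(0) = 0; returns f'(eta_max)."""
    h = eta_max / steps
    s = np.array([0.0, 0.0, fpp0])
    for _ in range(steps):
        k1 = blasius_rhs(s)
        k2 = blasius_rhs(s + 0.5 * h * k1)
        k3 = blasius_rhs(s + 0.5 * h * k2)
        k4 = blasius_rhs(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s[1]

def shoot(lo=0.1, hi=1.0, iters=40):
    """Bisection on f''(0) until f'(eta_max) = 1 is met."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if integrate(mid) > 1.0:
            hi = mid      # far-field slope too large: reduce f''(0)
        else:
            lo = mid
    return 0.5 * (lo + hi)

fpp0 = shoot()
print(fpp0)   # wall value f''(0) that satisfies the far-field condition
```

For this form of the equation the commonly tabulated wall value is $f''(0) \approx 0.332$, which gives a quick sanity check on any reference solver before it is compared with a network approximation.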
PARTIAL DIFFERENTIAL EQUATIONS

The author has successfully programmed Higher Order Networks, also known as Sigma-Pi 
networks, to approximate the solution of a partial differential equation [16]. 

Example: Linear Elliptic Partial Differential Equation 

A Sigma-Pi network of two inputs, one output node, and a single hidden layer has been 
constructed to approximate the solution to the linear elliptic equation that models fully 
developed steady flow of viscous Newtonian fluid through a duct of square cross-section: 


$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + 1 = 0 \qquad (12)$$


where u is the nondimensionalized velocity of the flow along the duct. 

The domain ($\Omega$) of the problem is: $-1.0 \le x \le 1.0$, $-1.0 \le y \le 1.0$
Boundary conditions: $u = 0$ along the perimeter $\Gamma$

Figures 8 and 9 show the surfaces made by the exact solution and the network solution 
of Eq. (12) for two thousand five hundred output weights. Figure 10 illustrates the rate of 
convergence of the network using the root mean square of the error and the interval spacing $h$.
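For comparison purposes, an exact solution of this problem can be evaluated from a separation-of-variables series. The sketch below assumes the equation in the form $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 + 1 = 0$ on the square $[-1, 1]^2$ with $u = 0$ on the boundary; the series form, the function name `duct_velocity_series`, and the truncation at thirty terms are choices made for this illustration and are not taken from the paper.

```python
import numpy as np

def duct_velocity_series(x, y, n_terms=30):
    """Truncated series solution of u_xx + u_yy + 1 = 0 on [-1, 1]^2, u = 0 on the boundary."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = 0.5 * (1.0 - x**2)               # particular solution with u_xx = -1
    for m in range(n_terms):
        n = 2 * m + 1                    # only odd harmonics contribute
        k = 0.5 * n * np.pi
        coeff = (16.0 / np.pi**3) * (-1.0) ** m / n**3
        # Harmonic correction that cancels the particular solution at y = +/-1
        u = u - coeff * np.cos(k * x) * np.cosh(k * y) / np.cosh(k)
    return u

print(float(duct_velocity_series(0.0, 0.0)))   # centerline (maximum) velocity
```

The terms decay like $1/n^3$, so a few dozen terms are usually enough for plotting or for computing a root mean square error against a numerical approximation.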
Figure 1: Feedforward Network Architecture

$\theta_i$ : Bias weight for the $i$th hidden node.
$\alpha_i$ : Input weight for the $i$th hidden node.
$w_i$ : Output weight for the $i$th hidden node.
$\sigma$ : Bias weight for the output node.



Figure 2: Piecewise Polynomial Transfer Function 





Figure 3: Distribution of Transfer Functions 
Figure 4: Transfer Function Offset



Figure 5: Comparison of Exact Solution and Network Approximation of Eq. (4) for 21 
output weights 



Figure 6: Comparison of Runge-Kutta Solution and Network Approximation of Eq. (11) 
for 51 output weights 
Figure 9: Network Solution of Eq. (12) for 2500 output weights (panel title: Hard Limit Solution For Poisson's Equation)



Figure 10: Convergence Rate of Network for Eq. (12) 





